Shotgun Metagenomic Data Analysis    ◾    323

The “-i” option specifies the input contigs FASTA file, the “-a” option specifies the depth file that contains the contig depth averages and variances, “-o” specifies the output path and prefix, “-v” enables verbose output, and “--seed” sets a seed integer so that the same results can be reproduced.
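For reference, a typical invocation combining these options might look like the sketch below. It assumes, as the options suggest, that the binner is MetaBAT 2 with its executable named `metabat2`. The helper name `run_binning`, the file names, and the default seed of 1 are hypothetical placeholders. Setting `DRY_RUN=1` prints the command instead of running it, which makes it easy to check the paths first.

```shell
#!/usr/bin/env bash
# Sketch: build and run a MetaBAT 2 binning command.
# Assumes the binner executable is "metabat2"; file names are placeholders.
run_binning() {
  local contigs="$1"      # input contigs FASTA file (-i)
  local depth="$2"        # depth file with contig depth averages and variances (-a)
  local out_prefix="$3"   # output path and prefix for the bin files (-o)
  local seed="${4:-1}"    # seed integer for reproducible results (--seed); 1 is an arbitrary default

  # Collect the full command in the positional parameters.
  set -- metabat2 -i "$contigs" -a "$depth" -o "$out_prefix" -v --seed "$seed"

  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"             # DRY_RUN=1: only print the command
  else
    mkdir -p "$(dirname "$out_prefix")"   # make sure the output directory exists
    "$@"                  # run the command with its arguments intact
  fi
}
```

For example, `DRY_RUN=1 run_binning contigs.fa depth.txt bins/bin 42` prints the command, and the same call without `DRY_RUN=1` runs it.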

At this point, we have completed the taxonomic binning and separated the contigs into draft genomes, one for each potential species in the metagenomic sample. However, we do not yet know the quality of these genomes or which microbial species they belong to. Therefore, the next step is to evaluate these genomic sequences and assess their completeness and contamination based on conserved protein-coding marker genes and their annotations.
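Before evaluating the bins, it can help to see what the binning step actually produced. The sketch below counts the contigs and total base pairs in each bin FASTA file. The helper name `summarize_bins` is hypothetical, and the `.fa` extension is an assumption, since binners name their output files differently.

```shell
# Print "bin_file <TAB> n_contigs <TAB> total_bp" for each FASTA file in a bin directory.
# Assumes bins are stored as <dir>/*.<ext>, with "fa" as the default extension.
summarize_bins() {
  local dir="$1" ext="${2:-fa}" f
  for f in "$dir"/*."$ext"; do
    [ -e "$f" ] || continue            # skip when the glob matched no files
    awk -v name="$(basename "$f")" '
      /^>/ { n++; next }                           # header line: one more contig
      { gsub(/[ \t\r]/, ""); bp += length($0) }    # sequence line: add its length
      END { printf "%s\t%d\t%d\n", name, n, bp }
    ' "$f"
  done
}
```

Calling `summarize_bins bins` (directory name assumed) lists one line per bin; unusually small bins or bins with very many short contigs are worth a closer look during evaluation.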

8.2.8  Bin Evaluation

Bin quality is commonly assessed with CheckM [11], a collection of tools for assessing the quality of genomes recovered from metagenomes, as well as genomes recovered from single cells and isolates. CheckM provides estimates of genome completeness and contamination, along with plots and other useful reports. For installation instructions, visit “https://github.com/Ecogenomics/CheckM/wiki”. On Linux, CheckM requires HMMER, Prodigal, and pplacer to be installed and added to the system path.

sudo apt update
sudo apt install hmmer
sudo apt install prodigal

Follow the installation instructions on the pplacer home page, available at “https://matsen.fhcrc.org/pplacer/”, and add pplacer to the Linux path. Then, install CheckM with the following commands:

pip3 install numpy
pip3 install matplotlib
pip3 install pysam
pip3 install checkm-genome

Alternatively, CheckM can be installed with Anaconda using either of the following commands:

conda install -c bioconda checkm-genome
conda install -c bioconda/label/cf201901 checkm-genome
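Whichever installation route is used, CheckM only works if its helper programs can be found on the PATH. The sketch below shows one way to add a pplacer directory to the PATH and then check for the required executables. The directory `$HOME/tools/pplacer` is a hypothetical location, `check_tools` is a hypothetical helper, and the executable names to check (for example `hmmsearch` and `hmmalign` from HMMER, `prodigal`, `pplacer`, and `checkm`) are assumptions about what a standard installation provides.

```shell
# Hypothetical install location of the pplacer binaries; adjust to your system.
PPLACER_DIR="$HOME/tools/pplacer"
# Prepend it to PATH for the current shell session
# (add the same export line to ~/.bashrc to make the change permanent).
export PATH="$PPLACER_DIR:$PATH"

# Report each requested executable as found or missing on PATH.
# Returns the number of missing programs (0 means everything was found).
check_tools() {
  local missing=0 tool
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found:   $tool -> $(command -v "$tool")"
    else
      echo "MISSING: $tool" >&2
      missing=$((missing + 1))
    fi
  done
  return "$missing"
}
```

For example, `check_tools hmmsearch hmmalign prodigal pplacer checkm` prints where each program was found and exits with a non-zero status if any of them is missing.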

Now, we can run CheckM to assess the completeness and contamination of the genome bins using lineage-specific marker sets. This workflow consists of several steps: placing the bins in the reference genome tree, assessing the phylogenetic markers found in each bin, inferring a lineage-specific marker set for each bin, and then using these marker sets to estimate completeness and contamination. These steps can be run as separate CheckM commands, or all at once with the “lineage_wf” command.
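A minimal sketch of both routes follows. It assumes the bins are FASTA files with the `.fa` extension in a directory named `bins` and writes results to `checkm_out`; both names, the thread count, and the helper functions `run`, `checkm_one_step`, and `checkm_step_by_step` are placeholders. The option `-x` sets the bin file extension and `-t` the number of threads. The individual commands (`tree`, `tree_qa`, `lineage_set`, `analyze`, `qa`) follow the lineage-specific workflow described on the CheckM wiki, but their exact options should be confirmed with `checkm <command> -h` for the installed version. Setting `DRY_RUN=1` prints the commands without running them.

```shell
# Sketch: evaluate bins with CheckM, either in one step or step by step.
# BIN_DIR, OUT_DIR, EXT and THREADS are placeholder values.
BIN_DIR="bins"
OUT_DIR="checkm_out"
EXT="fa"
THREADS=8

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Option 1: the complete lineage-specific workflow in a single command.
checkm_one_step() {
  run checkm lineage_wf -t "$THREADS" -x "$EXT" "$BIN_DIR" "$OUT_DIR"
}

# Option 2: the same workflow as individual commands; stops at the first failure.
checkm_step_by_step() {
  local markers="$OUT_DIR/lineage.ms"   # marker-set file written by lineage_set
  # Place the bins in the reference genome tree.
  run checkm tree -t "$THREADS" -x "$EXT" "$BIN_DIR" "$OUT_DIR" || return
  # Assess the phylogenetic markers found in each bin.
  run checkm tree_qa "$OUT_DIR" || return
  # Infer a lineage-specific marker set for each bin.
  run checkm lineage_set "$OUT_DIR" "$markers" || return
  # Identify the marker genes in each bin.
  run checkm analyze -t "$THREADS" -x "$EXT" "$markers" "$BIN_DIR" "$OUT_DIR" || return
  # Estimate completeness and contamination from the markers.
  run checkm qa "$markers" "$OUT_DIR"
}
```

For example, `DRY_RUN=1 checkm_step_by_step` lists the five commands in order, which is a convenient way to compare them with the single `lineage_wf` call before committing to a long run.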